1 Introduction

1.1 Biological background and motivation

Immune molecules such as B and T cell receptors, human leukocyte antigens (HLAs) or killer Ig-like receptors (KIRs) are encoded in the genetically most diverse loci of the human genome. Many of these immune genes are hyperpolymorphic, showing high allelic diversity across human populations. In addition, typical immune molecules are polygenic, which means that multiple functionally similar genes encode the same protein subunit.

We have developed a workflow, that allows quantification and interactive exploration of donor-specific alleles of different immune genes. The workflow is composed of three steps: 1. Typing of donor-specific alleles, 2. Quantification of these alleles in single-cell transcriptomic data, 3. exploration of single-cell transcriptomic data using SingleCellAlleleExperiment (SCAE) class (Fig. 1).

The aim of the here presented SingleCellAlleleExperiment (SCAE) class is to provide an addition to the SingleCellExperiment object that can simultaneously contain quantification of alleles, genes, and groups of functionally similar genes and thus allows data analysis across these immunologically relevent different layers of annotation.

1.2 The SingleCellAlleleExperiment (SCAE) class

The SingleCellAlleleExperiment (SCAE) class is a container for storing and handling allele-aware quantification data for immune genes. The SCAE class is derived from the SingleCellExperiment(SCE) class and uses the same overall object architecture. However, multiple data layers are integrated into the object during object generation. Data from a novel allele-aware quantification method contains information about alleles for immune genes. During object generation, the allele information is aggregated into two additional data layers via an ontology based MHC design principle and appended to the initial raw data using a lookup table. Thus the final SCAE object contains non-immune genes, alleles for immune genes, an immune gene layer and a functional class layer (refer to Figure 1).

For example, the counts of the alleles A*01:01:01:01 and A*02:01:01:01 that are present in the raw input data will be summed up and transformed into the HLA-A immune gene layer. Next, all counts for the present HLA-class I immune genes will be summed up and transformed into the HLA-class I functional class layer. The information necessary to perform these transformations is saved in a lookup table retrieved from the IPD-IMGT/HLA database.

The implemented object follows similar conventions like the SCE class, where rows should represent features (genes, transcripts) and columns should represent cells. Established single cell packages like scater and scran can be used with the SCAE object to perform downstream analysis on immune gene expression.

This allows new insights on functional as well as allele level to uncover the high diversity of immune genes.

alt text here Figure 1: Scheme of SingleCellAlleleExperiment object structure with lookup table.

1.3 Expected input

The read in function of the SCAE package readAlleleCounts expects specific files and file identifiers. The stated input directory should contain the following files:

  • cells_x_genes.barcodes.txt (list of barcodes/cell identifiers)
  • cells_x_genes.features.txt (list of feature identifiers)
  • cells_x_genes.mtx (Contains the quantification matrix)
  • lookup_table.csv (Info for generating multiple data layers)

The used dataset for later downstream analysis and testing of the functionalities of the multi-layer object is a dual-center, two-cohort study where whole blood and peripheral blood mononuclear cells underwent scRNA-sequencing. The whole transcriptome dataset is under controlled access and not for public use. The corresponding publication Severe COVID-19 Is Marked by a Dysregulated Myeloid Cell Compartment can be found here.

2 Loading packages

The following packages are abundant for performing the downstream analysis and visualization.

library(SingleCellAlleleExperiment)
library(scran)
library(scater)
library(gridExtra)
library(tidyverse)
library(patchwork)
library(cowplot)
library(ggplot2)
library(viridis)

3 Reading in data and quality check

3.1 Stating input directory containing the expected files

First the directory path is stated. The directory should contain all expected files.

dir_path <- "../inst/extdata"

list.files("../inst/extdata")
## [1] "cells_x_genes.barcodes.txt" "cells_x_genes.genes.txt"   
## [3] "cells_x_genes.mtx"          "intro.html"                
## [5] "lookup_table_HLA_only.csv"  "scae_advanced.png"

3.2 Generate SingleCellAlleleExperiment object

SCAE objects implements the same object architecture as SCE objects. When reading allele counts, the method by default filters cells from a knee-plot inflection point. Alternatively, manual filter settings can be provided.

scae <- readAlleleCounts(dir_path,
                         sample.names = "EGA_WTA",
                         filter = TRUE,
                         exp_type = "WTA",
                         symbols = "biomart")
## Warning: Removed 2555 rows containing missing values (`geom_line()`).

## Filtering performed based on the inflection point at:  105  UMI counts.[1] "Runtime check (1/2) Read_in: 3.76858973503113"
## [1] "     Generating SCAE (1/5) extending rowData: 4.0276985168457"
## [1] "     Generating SCAE (2/5) filtering and normalization: 0.13380765914917"
## [1] "     Generating SCAE (3/5) alleles2genes: 1.39652752876282"
## [1] "     Generating SCAE (4/5) genes2functional: 1.38803172111511"
## [1] "     Generating SCAE (5/5) log_transform: 1.38803172111511"
## [1] "Runtime check (2/2) Generating SCAE completed: 7.30348300933838"
## [1] "Total runtime, completed read_in, filtering and normalization and generating scae object 11.0722854137421"
scae
## class: SingleCellAlleleExperiment 
## dim: 62718 7501 
## metadata(0):
## assays(2): counts logcounts
## rownames(62718): ENSG00000160072.20 ENSG00000279928.2 ... HLA_class_I
##   HLA_class_II
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(7501): ATACTTAGGACAGTGGTAGGCTACAGA CAGAATCGTGAGTGCGAAGGTGAGTTA
##   ... ACACACAAATTGCGTACATTCAGCTCA ATCAGAGCTGTACATCTACACGATCCG
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

3.2.1 RowData slot

Two new classification columns are introduced in the rowData slot. Namely the NI_I column and Quant_type column. Both columns are used to identify each row of the object to its corresponding data layer (see figure 1).

rowData(scae)
## DataFrame with 62718 rows and 4 columns
##                            Ensembl.ID       Symbol        NI_I  Quant_type
##                           <character>  <character> <character> <character>
## ENSG00000160072.20 ENSG00000160072.20       ATAD3B          NI           G
## ENSG00000279928.2   ENSG00000279928.2     DDX11L17          NI           G
## ENSG00000228037.1   ENSG00000228037.1           NA          NI           G
## ENSG00000142611.17 ENSG00000142611.17       PRDM16          NI           G
## ENSG00000284616.1   ENSG00000284616.1           NA          NI           G
## ...                               ...          ...         ...         ...
## HLA-DQA1                     HLA-DQA1     HLA-DQA1           I           G
## HLA-DQB1                     HLA-DQB1     HLA-DQB1           I           G
## HLA-DPB1                     HLA-DPB1     HLA-DPB1           I           G
## HLA_class_I               HLA_class_I  HLA_class_I           I           F
## HLA_class_II             HLA_class_II HLA_class_II           I           F

3.2.2 ColData slot

As the object extends the count matrix during the object generation, its abundant to compute scaling factors on the raw data prior to extending and integrating the data layers. The scaling factors are used for scaling normalization in a later step of the SCAE constructor.

colData(scae)
## DataFrame with 7501 rows and 3 columns
##                                  Sample                Barcode sizeFactor
##                             <character>            <character>  <numeric>
## ATACTTAGGACAGTGGTAGGCTACAGA     EGA_WTA ATACTTAGGACAGTGGTAGG..   3.491050
## CAGAATCGTGAGTGCGAAGGTGAGTTA     EGA_WTA CAGAATCGTGAGTGCGAAGG..   0.493435
## AGCCATCACACCTGAGTCCAATACAAG     EGA_WTA AGCCATCACACCTGAGTCCA..   2.328737
## CTTCACATAGTGCAGTAATACTTCGGA     EGA_WTA CTTCACATAGTGCAGTAATA..   2.316401
## ATGTAATGGCCGTATCTAACTGAATTC     EGA_WTA ATGTAATGGCCGTATCTAAC..   0.190521
## ...                                 ...                    ...        ...
## GCAATCCGATGTGAAGAAAGCTATGTG     EGA_WTA GCAATCCGATGTGAAGAAAG..   0.581156
## GAGCCAATAAGCGACACCAACGTGTGA     EGA_WTA GAGCCAATAAGCGACACCAA..   1.894241
## CACACACTAATCCTAGGACAAACGTGG     EGA_WTA CACACACTAATCCTAGGACA..   2.538447
## ACACACAAATTGCGTACATTCAGCTCA     EGA_WTA ACACACAAATTGCGTACATT..   0.150772
## ATCAGAGCTGTACATCTACACGATCCG     EGA_WTA ATCAGAGCTGTACATCTACA..   0.149401

3.2.3 Extract information from the different data layers

Additionally to the established getters from the SCE package, new getters are implemented to retrieve the different data layers integrated in the SCAE object.


3.2.3.1 Non-immune genes

get_nigenes(scae)
## class: SingleCellAlleleExperiment 
## dim: 62695 7501 
## metadata(0):
## assays(2): counts logcounts
## rownames(62695): ENSG00000160072.20 ENSG00000279928.2 ...
##   ENSG00000277475.1 ENSG00000275405.1
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(7501): ATACTTAGGACAGTGGTAGGCTACAGA CAGAATCGTGAGTGCGAAGGTGAGTTA
##   ... ACACACAAATTGCGTACATTCAGCTCA ATCAGAGCTGTACATCTACACGATCCG
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_nigenes(scae)))
## [1] "ENSG00000160072.20" "ENSG00000279928.2"  "ENSG00000228037.1" 
## [4] "ENSG00000142611.17" "ENSG00000284616.1"  "ENSG00000157911.11"


3.2.3.2 Alleles

get_alleles(scae)
## class: SingleCellAlleleExperiment 
## dim: 14 7501 
## metadata(0):
## assays(2): counts logcounts
## rownames(14): A*02:01:01:01 A*26:01:01:01 ... DPB1*10:01:01:01
##   DPB1*13:01:01:01
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(7501): ATACTTAGGACAGTGGTAGGCTACAGA CAGAATCGTGAGTGCGAAGGTGAGTTA
##   ... ACACACAAATTGCGTACATTCAGCTCA ATCAGAGCTGTACATCTACACGATCCG
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_alleles(scae)))
## [1] "A*02:01:01:01" "A*26:01:01:01" "B*51:01:01:01" "B*57:01:01:01"
## [5] "C*04:01:01:01" "C*06:02:01:01"


3.2.3.3 Immune genes

get_agenes(scae)
## class: SingleCellAlleleExperiment 
## dim: 7 7501 
## metadata(0):
## assays(2): counts logcounts
## rownames(7): HLA-A HLA-B ... HLA-DQB1 HLA-DPB1
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(7501): ATACTTAGGACAGTGGTAGGCTACAGA CAGAATCGTGAGTGCGAAGGTGAGTTA
##   ... ACACACAAATTGCGTACATTCAGCTCA ATCAGAGCTGTACATCTACACGATCCG
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_agenes(scae)))
## [1] "HLA-A"    "HLA-B"    "HLA-C"    "HLA-DRB1" "HLA-DQA1" "HLA-DQB1"


3.2.3.4 Functional class

get_func(scae)
## class: SingleCellAlleleExperiment 
## dim: 2 7501 
## metadata(0):
## assays(2): counts logcounts
## rownames(2): HLA_class_I HLA_class_II
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(7501): ATACTTAGGACAGTGGTAGGCTACAGA CAGAATCGTGAGTGCGAAGGTGAGTTA
##   ... ACACACAAATTGCGTACATTCAGCTCA ATCAGAGCTGTACATCTACACGATCCG
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_func(scae)))
## [1] "HLA_class_I"  "HLA_class_II"


3.2.3.5 Unknown alleles

In case the raw input data contains allele information that is not present in the lookup table, these alleles will be classified accordingly in the Quant_type classification column in the colData slot and can be retrieved with the get_unknown() getter. These alleles have potentially invalid identifiers and will not be considered in further generated data-layers.

get_unknown(scae)
## class: SingleCellAlleleExperiment 
## dim: 0 7501 
## metadata(0):
## assays(2): counts logcounts
## rownames(0):
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(7501): ATACTTAGGACAGTGGTAGGCTACAGA CAGAATCGTGAGTGCGAAGGTGAGTTA
##   ... ACACACAAATTGCGTACATTCAGCTCA ATCAGAGCTGTACATCTACACGATCCG
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_unknown(scae)))
## character(0)

4 Expression evaluation

Checking the expression for the allele-layer, immune gene layer and functional class layer. Allele identifiers are in the form of A*02:01:01:01. The immune genes are in the form of HLA-A and the functional classes HLA_class_I. Here we see that HLA-class I and HLA-C are the most abundant functional class and immune gene respectively, given the underlying dataset.

all_alleles <- c(rownames(get_alleles(scae)),
                 rownames(get_agenes(scae)),
                 rownames(get_func(scae)))

plotExpression(scae, all_alleles)


5 Downstream analysis

In the following sections, main steps for dimensional reduction are performed, offering insights into the different data layers of the SCAE object as well giving an idea on how to perform immune gene expression analysis.

5.1 Subsetting the different layers

The non-imune genes are combined with each of the integrated immune gene allele-aware layers to determine three different subsets.

5.1.1 Non-immune genes + alleles

redim_scae <- scae

ni_a <- redim_scae[c(rownames(get_nigenes(redim_scae)), rownames(get_alleles(redim_scae))), ]

ni_a
## class: SingleCellAlleleExperiment 
## dim: 62709 7501 
## metadata(0):
## assays(2): counts logcounts
## rownames(62709): ENSG00000160072.20 ENSG00000279928.2 ...
##   DPB1*10:01:01:01 DPB1*13:01:01:01
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(7501): ATACTTAGGACAGTGGTAGGCTACAGA CAGAATCGTGAGTGCGAAGGTGAGTTA
##   ... ACACACAAATTGCGTACATTCAGCTCA ATCAGAGCTGTACATCTACACGATCCG
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):


5.1.2 Non-immune genes + immune genes

ni_g <- redim_scae[c(rownames(get_nigenes(redim_scae)), rownames(get_agenes(redim_scae))), ]

ni_g
## class: SingleCellAlleleExperiment 
## dim: 62702 7501 
## metadata(0):
## assays(2): counts logcounts
## rownames(62702): ENSG00000160072.20 ENSG00000279928.2 ... HLA-DQB1
##   HLA-DPB1
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(7501): ATACTTAGGACAGTGGTAGGCTACAGA CAGAATCGTGAGTGCGAAGGTGAGTTA
##   ... ACACACAAATTGCGTACATTCAGCTCA ATCAGAGCTGTACATCTACACGATCCG
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):


5.1.3 Non-immune genes + functional class

ni_f <- redim_scae[c(rownames(get_nigenes(redim_scae)), rownames(get_func(redim_scae))), ]

ni_f
## class: SingleCellAlleleExperiment 
## dim: 62697 7501 
## metadata(0):
## assays(2): counts logcounts
## rownames(62697): ENSG00000160072.20 ENSG00000279928.2 ... HLA_class_I
##   HLA_class_II
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(7501): ATACTTAGGACAGTGGTAGGCTACAGA CAGAATCGTGAGTGCGAAGGTGAGTTA
##   ... ACACACAAATTGCGTACATTCAGCTCA ATCAGAGCTGTACATCTACACGATCCG
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

5.2 Dimensional Reduction

5.2.1 Model variance and HVGs for all data layers

Using the modelGeneVar() function prior to getTopHVGs. Both functions are part from the scran package.

df_ni_a <- modelGeneVar(ni_a)

head(df_ni_a)
## DataFrame with 6 rows and 6 columns
##                           mean       total        tech          bio    p.value
##                      <numeric>   <numeric>   <numeric>    <numeric>  <numeric>
## ENSG00000160072.20 3.14134e-03 4.40929e-03 3.55075e-03  8.58536e-04 0.00224164
## ENSG00000279928.2  8.13804e-05 4.96774e-05 9.19867e-05 -4.23093e-05 0.99999997
## ENSG00000228037.1  1.04811e-04 8.24006e-05 1.18471e-04 -3.60701e-05 0.99982732
## ENSG00000142611.17 2.89372e-04 2.24889e-04 3.27086e-04 -1.02197e-04 0.99987990
## ENSG00000284616.1  0.00000e+00 0.00000e+00 0.00000e+00  0.00000e+00        NaN
## ENSG00000157911.11 1.63798e-03 1.85760e-03 1.85146e-03  6.14107e-06 0.48445060
##                          FDR
##                    <numeric>
## ENSG00000160072.20 0.0191929
## ENSG00000279928.2  1.0000000
## ENSG00000228037.1  1.0000000
## ENSG00000142611.17 1.0000000
## ENSG00000284616.1        NaN
## ENSG00000157911.11 1.0000000
df_ni_g <- modelGeneVar(ni_g)

head(df_ni_g)
## DataFrame with 6 rows and 6 columns
##                           mean       total        tech          bio    p.value
##                      <numeric>   <numeric>   <numeric>    <numeric>  <numeric>
## ENSG00000160072.20 3.14134e-03 4.40929e-03 3.56159e-03  8.47700e-04 0.00210309
## ENSG00000279928.2  8.13804e-05 4.96774e-05 9.22675e-05 -4.25901e-05 0.99999999
## ENSG00000228037.1  1.04811e-04 8.24006e-05 1.18832e-04 -3.64317e-05 0.99988648
## ENSG00000142611.17 2.89372e-04 2.24889e-04 3.28084e-04 -1.03195e-04 0.99992239
## ENSG00000284616.1  0.00000e+00 0.00000e+00 0.00000e+00  0.00000e+00        NaN
## ENSG00000157911.11 1.63798e-03 1.85760e-03 1.85711e-03  4.90883e-07 0.49873188
##                          FDR
##                    <numeric>
## ENSG00000160072.20 0.0180077
## ENSG00000279928.2  1.0000000
## ENSG00000228037.1  1.0000000
## ENSG00000142611.17 1.0000000
## ENSG00000284616.1        NaN
## ENSG00000157911.11 1.0000000
df_ni_f <- modelGeneVar(ni_f)

head(df_ni_f)
## DataFrame with 6 rows and 6 columns
##                           mean       total        tech          bio    p.value
##                      <numeric>   <numeric>   <numeric>    <numeric>  <numeric>
## ENSG00000160072.20 3.14134e-03 4.40929e-03 3.57652e-03  8.32770e-04 0.00233094
## ENSG00000279928.2  8.13804e-05 4.96774e-05 9.26542e-05 -4.29769e-05 0.99999999
## ENSG00000228037.1  1.04811e-04 8.24006e-05 1.19330e-04 -3.69298e-05 0.99991529
## ENSG00000142611.17 2.89372e-04 2.24889e-04 3.29459e-04 -1.04570e-04 0.99994261
## ENSG00000284616.1  0.00000e+00 0.00000e+00 0.00000e+00  0.00000e+00        NaN
## ENSG00000157911.11 1.63798e-03 1.85760e-03 1.86490e-03 -7.29427e-06 0.51895488
##                          FDR
##                    <numeric>
## ENSG00000160072.20 0.0199549
## ENSG00000279928.2  1.0000000
## ENSG00000228037.1  1.0000000
## ENSG00000142611.17 1.0000000
## ENSG00000284616.1        NaN
## ENSG00000157911.11 1.0000000

Plotting the variance for each data layer.

#plot variance
par(mfrow=c(1,3))
fit_ni_a <- metadata(df_ni_a)
plot(fit_ni_a$mean, fit_ni_a$var,
     xlab = "mean of log-expression", ylab = "variance of log-expression", main = "NI-A")
points(fit_ni_a$mean, fit_ni_a$var, col = "red", pch = 16)
curve(fit_ni_a$trend(x), col = "dodgerblue", add = TRUE, lwd = 2)

fit_ni_g <- metadata(df_ni_g)
plot(fit_ni_g$mean, fit_ni_g$var,
     xlab = "mean of log-expression", ylab = "variance of log-expression", main = "NI-G")
points(fit_ni_g$mean, fit_ni_g$var, col = "red", pch = 16)
curve(fit_ni_g$trend(x), col = "dodgerblue", add = TRUE, lwd = 2)

fit_ni_f <- metadata(df_ni_f)
plot(fit_ni_f$mean, fit_ni_f$var,
     xlab = "mean of log-expression", ylab = "variance of log-expression", main = "NI-F")
points(fit_ni_f$mean, fit_ni_f$var, col = "red", pch = 16)
curve(fit_ni_f$trend(x), col = "dodgerblue", add = TRUE, lwd = 2)

par(mfrow=c(1,1))

Compute a list of HVGs for each data layer. Return the top 0.1 % HVGs per layer using getTopHVGs.

top_ni_a <- getTopHVGs(df_ni_a, prop = 0.1)
length(top_ni_a)
## [1] 1090
head(top_ni_a, n = 15)
##  [1] "ENSG00000244734.4"  "ENSG00000188536.13" "ENSG00000206172.8" 
##  [4] "ENSG00000143546.10" "ENSG00000115523.17" "ENSG00000197746.15"
##  [7] "ENSG00000090382.7"  "ENSG00000163220.11" "ENSG00000147454.14"
## [10] "ENSG00000019582.17" "C*06:02:01:01"      "ENSG00000087086.15"
## [13] "ENSG00000162722.9"  "ENSG00000158869.11" "C*04:01:01:01"
top_ni_g <- getTopHVGs(df_ni_g, prop = 0.1)
length(top_ni_g)
## [1] 1075
head(top_ni_g, n = 15)
##  [1] "ENSG00000244734.4"  "ENSG00000188536.13" "ENSG00000206172.8" 
##  [4] "ENSG00000143546.10" "ENSG00000115523.17" "ENSG00000197746.15"
##  [7] "ENSG00000090382.7"  "ENSG00000163220.11" "ENSG00000147454.14"
## [10] "ENSG00000019582.17" "ENSG00000087086.15" "ENSG00000162722.9" 
## [13] "ENSG00000158869.11" "ENSG00000223609.11" "ENSG00000211592.8"
top_ni_f <- getTopHVGs(df_ni_f, prop = 0.1)
length(top_ni_f)
## [1] 1058
head(top_ni_f, n = 15)
##  [1] "ENSG00000244734.4"  "ENSG00000188536.13" "ENSG00000206172.8" 
##  [4] "ENSG00000143546.10" "ENSG00000115523.17" "ENSG00000197746.15"
##  [7] "ENSG00000090382.7"  "ENSG00000163220.11" "ENSG00000147454.14"
## [10] "ENSG00000019582.17" "ENSG00000087086.15" "ENSG00000162722.9" 
## [13] "ENSG00000158869.11" "ENSG00000223609.11" "ENSG00000211592.8"

5.2.2 PCA

Compute PCA for each layer and store the results in the object. Its Important to make unique identifiers for each layer/run or the results will be overwritten and just saved as PCA. Here, the runPCA functions from the scater package is used.

set.seed(18)
redim_scae <- runPCA(redim_scae, ncomponents = 10, subset_row = top_ni_a, exprs_values = "logcounts", name = "PCA_a")
redim_scae <- runPCA(redim_scae, ncomponents = 10, subset_row = top_ni_g, exprs_values = "logcounts", name = "PCA_g")
redim_scae <- runPCA(redim_scae, ncomponents = 10, subset_row = top_ni_f, exprs_values = "logcounts", name = "PCA_f")
reducedDimNames(redim_scae)
## [1] "PCA_a" "PCA_g" "PCA_f"

5.2.3 t-SNE

The same goes for running t-SNE with the runTSNE function from the scater package. Unique identifiers are stated here for each layer as well.

set.seed(18)
redim_scae <- runTSNE(redim_scae, dimred= "PCA_a", perplexity = 50,  name = "TSNE_a_50")
set.seed(18)
redim_scae <- runTSNE(redim_scae, dimred= "PCA_g", perplexity = 50,  name = "TSNE_g_50")
set.seed(18)
redim_scae <- runTSNE(redim_scae, dimred= "PCA_f", perplexity = 50,  name = "TSNE_f_50")

List of results from the performed reduced dimension analysis.

reducedDimNames(redim_scae)
## [1] "PCA_a"     "PCA_g"     "PCA_f"     "TSNE_a_50" "TSNE_g_50" "TSNE_f_50"

5.3 Visualization

Exemplary visualization for the t-SNE results on gene level for immune genes that relate to HLA-class I. In the given dataset, these are the immune genes HLA-A, HLA-B and HLA-C plotted alongside their alleles. This allows for insights into potential genetic differences shown on allele-level.


5.3.1 HLA-A immune gene and alleles

#determining which t-SNE (perplexity) to choose for all visualizations
which_tsne <- "TSNE_g_50"

#EGA_hla_a_alleles
tsne_g_a  <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "HLA-A") + ggtitle("HLA-A gene")
tsne_g_a1 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "A*02:01:01:01") + ggtitle("Allele A*02:01:01:01")
tsne_g_a2 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "A*26:01:01:01") + ggtitle("Allele A*26:01:01:01")

p2 <- tsne_g_a + tsne_g_a1 + tsne_g_a2

p2


5.3.2 HLA-B immune gene and alleles

tsne_g_b  <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "HLA-B") + ggtitle("HLA-B gene")
tsne_g_b1 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "B*51:01:01:01") + ggtitle("Allele B*51:01:01:01")
tsne_g_b2 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "B*57:01:01:01") + ggtitle("Allele B*57:01:01:01")

p3 <- tsne_g_b  + tsne_g_b1 + tsne_g_b2

p3


5.3.3 HLA-C immune gene and alleles

#EGA_hla_c_alleles 
tsne_g_g  <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "HLA-C") + ggtitle("HLA-C gene")
tsne_g_c1 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "C*04:01:01:01") + ggtitle("Allele C*04:01:01:01")
tsne_g_c2 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "C*06:02:01:01") + ggtitle("Allele C*06:02:01:01")

p1 <- tsne_g_g  + tsne_g_c1 + tsne_g_c2 

p1

6 Session Information

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Berlin
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] viridis_0.6.4                         viridisLite_0.4.2                    
##  [3] cowplot_1.1.1                         patchwork_1.1.3                      
##  [5] lubridate_1.9.2                       forcats_1.0.0                        
##  [7] stringr_1.5.0                         dplyr_1.1.2                          
##  [9] purrr_1.0.2                           readr_2.1.4                          
## [11] tidyr_1.3.0                           tibble_3.2.1                         
## [13] tidyverse_2.0.0                       gridExtra_2.3                        
## [15] scater_1.28.0                         ggplot2_3.4.3                        
## [17] scran_1.28.2                          scuttle_1.10.2                       
## [19] SingleCellAlleleExperiment_0.0.0.9000 SingleCellExperiment_1.22.0          
## [21] SummarizedExperiment_1.30.2           Biobase_2.60.0                       
## [23] GenomicRanges_1.52.0                  GenomeInfoDb_1.36.2                  
## [25] IRanges_2.34.1                        S4Vectors_0.38.1                     
## [27] BiocGenerics_0.46.0                   MatrixGenerics_1.12.3                
## [29] matrixStats_1.0.0                     BiocStyle_2.28.1                     
## 
## loaded via a namespace (and not attached):
##   [1] rstudioapi_0.15.0         jsonlite_1.8.7           
##   [3] magrittr_2.0.3            magick_2.7.5             
##   [5] ggbeeswarm_0.7.2          farver_2.1.1             
##   [7] rmarkdown_2.24            zlibbioc_1.46.0          
##   [9] vctrs_0.6.3               memoise_2.0.1            
##  [11] DelayedMatrixStats_1.22.6 RCurl_1.98-1.12          
##  [13] htmltools_0.5.6           S4Arrays_1.0.5           
##  [15] progress_1.2.2            curl_5.0.2               
##  [17] BiocNeighbors_1.18.0      Rhdf5lib_1.22.0          
##  [19] rhdf5_2.44.0              sass_0.4.7               
##  [21] bslib_0.5.1               cachem_1.0.8             
##  [23] igraph_1.5.1              lifecycle_1.0.3          
##  [25] pkgconfig_2.0.3           rsvd_1.0.5               
##  [27] Matrix_1.6-0              R6_2.5.1                 
##  [29] fastmap_1.1.1             GenomeInfoDbData_1.2.10  
##  [31] digest_0.6.33             colorspace_2.1-0         
##  [33] AnnotationDbi_1.62.2      dqrng_0.3.0              
##  [35] irlba_2.3.5.1             RSQLite_2.3.1            
##  [37] org.Hs.eg.db_3.17.0       beachmat_2.16.0          
##  [39] labeling_0.4.3            filelock_1.0.2           
##  [41] fansi_1.0.4               timechange_0.2.0         
##  [43] httr_1.4.7                abind_1.4-5              
##  [45] compiler_4.3.1            bit64_4.0.5              
##  [47] withr_2.5.0               BiocParallel_1.34.2      
##  [49] DBI_1.1.3                 highr_0.10               
##  [51] R.utils_2.12.2            HDF5Array_1.28.1         
##  [53] biomaRt_2.56.1            rappdirs_0.3.3           
##  [55] DelayedArray_0.26.7       bluster_1.10.0           
##  [57] tools_4.3.1               vipor_0.4.5              
##  [59] beeswarm_0.4.0            R.oo_1.25.0              
##  [61] glue_1.6.2                rhdf5filters_1.12.1      
##  [63] grid_4.3.1                Rtsne_0.16               
##  [65] cluster_2.1.4             generics_0.1.3           
##  [67] gtable_0.3.4              tzdb_0.4.0               
##  [69] R.methodsS3_1.8.2         hms_1.1.3                
##  [71] BiocSingular_1.16.0       ScaledMatrix_1.8.1       
##  [73] metapod_1.8.0             xml2_1.3.5               
##  [75] utf8_1.2.3                XVector_0.40.0           
##  [77] ggrepel_0.9.3             pillar_1.9.0             
##  [79] limma_3.56.2              BiocFileCache_2.8.0      
##  [81] lattice_0.21-8            bit_4.0.5                
##  [83] tidyselect_1.2.0          locfit_1.5-9.8           
##  [85] Biostrings_2.68.1         knitr_1.43               
##  [87] bookdown_0.35             edgeR_3.42.4             
##  [89] xfun_0.40                 statmod_1.5.0            
##  [91] DropletUtils_1.20.0       stringi_1.7.12           
##  [93] yaml_2.3.7                evaluate_0.21            
##  [95] codetools_0.2-19          BiocManager_1.30.22      
##  [97] cli_3.6.1                 munsell_0.5.0            
##  [99] jquerylib_0.1.4           Rcpp_1.0.11              
## [101] dbplyr_2.3.3              png_0.1-8                
## [103] XML_3.99-0.14             parallel_4.3.1           
## [105] blob_1.2.4                prettyunits_1.1.1        
## [107] sparseMatrixStats_1.12.2  bitops_1.0-7             
## [109] scales_1.2.1              crayon_1.5.2             
## [111] rlang_1.1.1               KEGGREST_1.40.0